Abstract

In a stratified analysis, the results from different strata are pooled, provided the homogeneity assumption is met, to obtain a single summary estimate for the common effect parameter. However, the effect can appear homogeneous across strata using one measure but heterogeneous using another. Consequently, two researchers analyzing the same data can arrive at conflicting conclusions if they use different effect measures. In this paper, the author draws on the sufficient component cause model to develop a stratified-analysis method for a particular effect measure, the 'peril ratio'. When there is no mechanistic interaction between the exposure under study and the stratifying variable (i.e., when they do not work together to complete any sufficient cause), the peril ratio is constant across strata. The author presents formulas for the estimation of such a common peril ratio. Three real data sets are re-analyzed for illustration. When the data is consistent with peril-ratio homogeneity in a stratified analysis, researchers can use the formulas in this paper to pool the strata.
Introduction

A central issue in epidemiology is characterizing the relationship between exposure and disease. As many other factors may confound or modify the effect of the exposure under study, epidemiologists often need to perform an analysis stratified by these confounders/modifiers. On the one hand, if heterogeneity of exposure effects is present (i.e., the effects are different across strata), then we report the stratum-specific estimates separately. On the other hand, if the data is consistent with homogeneity, then we pool the results from different strata to obtain a single summary estimate for the common effect parameter of the exposure [1].

However, the effect of an exposure can be measured either on a ratio scale, e.g., risk ratio, odds ratio and rate ratio, or on a difference scale, e.g., risk difference, odds difference and rate difference [1]. No one scale is better than the others, and so none is universally endorsed. Worse, an effect can appear homogeneous across strata when using one measure and heterogeneous when using another. Consequently, two researchers analyzing the same data with different measures can arrive at conflicting conclusions, which is certainly undesirable.

The sufficient component cause model [1-13] can help to resolve this conflict. A sufficient cause contains a combination of component causes. There may be many classes of sufficient causes for a disease, and any class with all of its components completed is sufficient to cause the disease. When there is no mechanistic interaction between the exposure under study and the stratifying variable (i.e., when they do not work together to complete any sufficient cause), a particular effect measure, the 'peril ratio', will be constant across strata [13]. In this paper, I present formulas for the estimation of such a common peril ratio. Three real data sets will be re-analyzed for illustration.

Methods 

Consider a dichotomous exposure and a dichotomous disease outcome in the follow-up of a population over a certain time interval. The exposure status is assumed to be time-invariant, and the follow-up is assumed to be free of loss to follow-up and competing deaths. A stratified analysis is to be performed based on a stratification variable with a total of K strata (i = 1, 2, ..., K). Table 1 presents the data layout for the i-th stratum.

The peril ratio (PR) for the i-th stratum is defined as

$$\mathrm{PR}_i = \frac{\mathrm{Peril}_i^{E}}{\mathrm{Peril}_i^{\bar{E}}} = \frac{1/S_i^{E}}{1/S_i^{\bar{E}}}, \qquad (1)$$

where $S_i^{\bar{E}}$ is the disease-free proportion at the i-th stratum for the unexposed (the number of disease-free unexposed subjects divided by $N_i^{\bar{E}}$, the number of unexposed subjects), and $S_i^{E}$ is the corresponding proportion for the exposed [13]. Note that a peril ratio is simply a risk ratio with reverse coding for both the exposure and the outcome. Therefore, formulas for the risk ratio [1] can be directly applied to the peril ratio, if reverse coding is properly acknowledged. A large-sample formula for the variance of a log peril ratio is

Table 1. Data layout for the

Peril ratios are to be interpreted as 'fold decreases' [13]. Assuming that no residual confounding exists, $\mathrm{PR}_i$ is the fold decrease in the disease-free probability for a subject at the i-th stratum if he/she changes status from being unexposed to being exposed. Under the assumption that there is no mechanistic interaction between the exposure under study and the stratifying variable, the peril ratios are constant across strata [13]. This common peril ratio is the fold decrease in disease-free probability for anyone, regardless of the stratum, whose status changes from being unexposed to being exposed.
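As a minimal sketch of the stratum-specific calculation, the following Python function computes a peril ratio and a large-sample 95% confidence interval from the disease-free counts. The function and argument names are illustrative and do not come from the paper. The variance expression is an assumption: it is the usual large-sample variance of a log risk ratio applied with reverse coding, as the text above suggests, rather than a formula copied from the paper.

```python
import math

def peril_ratio(free_unexp, n_unexp, free_exp, n_exp, z=1.96):
    """Return (PR, Var(log PR), (lower, upper)) for one stratum.

    free_* are counts of disease-free subjects and n_* are group sizes;
    these argument names are illustrative, not the paper's notation.
    """
    s_unexp = free_unexp / n_unexp  # disease-free proportion, unexposed
    s_exp = free_exp / n_exp        # disease-free proportion, exposed
    pr = s_unexp / s_exp            # equals (1/s_exp) / (1/s_unexp)
    # Assumed variance: the usual large-sample log risk-ratio variance,
    # applied with reverse coding of exposure and outcome.
    var_log = (1 - s_unexp) / (n_unexp * s_unexp) + (1 - s_exp) / (n_exp * s_exp)
    half_width = z * math.sqrt(var_log)
    ci = (pr * math.exp(-half_width), pr * math.exp(half_width))
    return pr, var_log, ci

# Age <55 stratum of the tolbutamide example: placebo 115/120 and
# tolbutamide 98/106 disease-free (surviving).
pr, var_log, (lower, upper) = peril_ratio(115, 120, 98, 106)
print(f"PR = {pr:.4f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```

With the Age <55 counts of the tolbutamide example, this sketch should agree with the stratum-specific peril ratio and interval reported in Table 2 to about four decimal places.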

To estimate the common peril ratio, one can use the inverse-variance weighted Woolf-type estimator [14]:

$$\mathrm{PR}_W = \exp\left( \frac{\sum_i w_i^{W} \times \log \mathrm{PR}_i}{\sum_i w_i^{W}} \right), \qquad (3)$$

where $w_i^{W} = \mathrm{Var}^{-1}(\log \mathrm{PR}_i)$. The variance of the logarithm of this pooled estimate (under the large-stratum limiting model) is [14]

$$\mathrm{Var}(\log \mathrm{PR}_W) = \frac{1}{\sum_i w_i^{W}}. \qquad (4)$$
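The Woolf-type pooling can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the author's code: the helper names and the tuple layout (disease-free unexposed, total unexposed, disease-free exposed, total exposed) are hypothetical, and the stratum variance is the log risk-ratio variance applied with reverse coding.

```python
import math

def log_pr_and_var(free_unexp, n_unexp, free_exp, n_exp):
    """Log peril ratio and its assumed large-sample variance for one stratum."""
    s0, s1 = free_unexp / n_unexp, free_exp / n_exp
    var = (1 - s0) / (n_unexp * s0) + (1 - s1) / (n_exp * s1)
    return math.log(s0 / s1), var

def woolf_pooled(strata, z=1.96):
    """Inverse-variance weighted pooling of log peril ratios."""
    logs, weights = [], []
    for row in strata:
        log_pr, var = log_pr_and_var(*row)
        logs.append(log_pr)
        weights.append(1.0 / var)          # Woolf weight = 1 / Var(log PR_i)
    total = sum(weights)
    pooled_log = sum(w * l for w, l in zip(weights, logs)) / total
    var_pooled = 1.0 / total               # variance of the pooled log PR
    half = z * math.sqrt(var_pooled)
    ci = (math.exp(pooled_log - half), math.exp(pooled_log + half))
    return math.exp(pooled_log), var_pooled, ci

# Tolbutamide example, two age strata:
# (surviving placebo, total placebo, surviving tolbutamide, total tolbutamide)
tolbutamide = [(115, 120, 98, 106), (69, 85, 76, 98)]
pr_w, var_w, (lower_w, upper_w) = woolf_pooled(tolbutamide)
print(f"PR_W = {pr_w:.4f}, 95% CI = ({lower_w:.4f}, {upper_w:.4f})")
```

Because each weight is the reciprocal of a stratum variance, any stratum in which one group has no disease-free subjects would cause a division by zero; the sketch leaves that case unhandled, which is consistent with the method's reliance on the large-stratum limiting model.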

Alternatively, one can use the Mantel-Haenszel estimator [15]:

$$\mathrm{PR}_{MH} = \frac{\sum_i w_i^{MH} \times S_i^{\bar{E}}}{\sum_i w_i^{MH} \times S_i^{E}}, \qquad (5)$$

where $w_i^{MH} = \dfrac{1}{\frac{1}{N_i^{\bar{E}}} + \frac{1}{N_i^{E}}}$, the Mantel-Haenszel weight, is the harmonic sum of the unexposed ($N_i^{\bar{E}}$) and the exposed ($N_i^{E}$) population for the i-th stratum. A variance formula for the Mantel-Haenszel estimator, which is valid under both the large-stratum and sparse-data limiting models, is [15]



Table 2. Re-analysis of the mortality data from all causes for tolbutamide and placebo treatment groups (a).

Stratum         Treatment     Surviving   Dead   Total   Peril Ratio (95% CI (b))
Age <55         Placebo             115      5     120   1.0000
                Tolbutamide          98      8     106   1.0366 (0.9704-1.1072)
Age 55+         Placebo              69     16      85   1.0000
                Tolbutamide          76     22      98   1.0467 (0.9030-1.2134)
Total (Crude)   Placebo             184     21     205   1.0000
                Tolbutamide         174     30     204   1.0523 (0.9779-1.1324)

(a) From Table 15-1 in Rothman KJ, Greenland S, Lash TL (2008) Modern Epidemiology, 3rd ed. Philadelphia: Lippincott.
(b) CI: confidence interval.
where $S_i$ is the disease-free proportion of the i-th stratum.
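The Mantel-Haenszel pooling can be sketched as follows. The estimator follows equation (5). The variance line is an assumption: it is the Greenland-Robins variance for a Mantel-Haenszel risk ratio, rewritten with reverse coding, and is not transcribed from the paper's own variance equation. Function and argument names are illustrative.

```python
import math

def mantel_haenszel_pr(strata, z=1.96):
    """Mantel-Haenszel common peril ratio with an assumed variance formula.

    Each stratum is (disease-free unexposed, total unexposed,
                     disease-free exposed,   total exposed).
    """
    num = den = var_top = 0.0
    for free_unexp, n_unexp, free_exp, n_exp in strata:
        n = n_unexp + n_exp
        w = n_unexp * n_exp / n                # harmonic sum 1/(1/N0 + 1/N1)
        num += w * free_unexp / n_unexp        # w * S (unexposed)
        den += w * free_exp / n_exp            # w * S (exposed)
        s_all = (free_unexp + free_exp) / n    # disease-free proportion of the stratum
        # Greenland-Robins numerator term under reverse coding (assumption)
        var_top += w * s_all - free_unexp * free_exp / n
    pr = num / den
    var_log = var_top / (num * den)
    half = z * math.sqrt(var_log)
    return pr, var_log, (pr * math.exp(-half), pr * math.exp(half))

# Tolbutamide example, two age strata
tolbutamide = [(115, 120, 98, 106), (69, 85, 76, 98)]
pr_mh, var_mh, (lower_mh, upper_mh) = mantel_haenszel_pr(tolbutamide)
print(f"PR_MH = {pr_mh:.4f}, 95% CI = ({lower_mh:.4f}, {upper_mh:.4f})")
```

Unlike the Woolf-type weights, nothing here divides by a stratum-specific disease-free count, so the sketch still runs when individual strata are very small.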

In the above, we have invoked the assumption of peril-ratio homogeneity (equivalently, the assumption of no mechanistic interaction). In practice, this assumption needs to be checked using the data on hand. Here, I extend the PRISM (peril ratio index of synergy based on multiplicativity) test used in a previous paper [13] in order to deal with the present situation of K > 2. First, we calculate $\delta$, a (K − 1) × 1 column vector of the estimates of the logPRISMs with its i-th element (1 ≤ i < K) being

$$\log \mathrm{PRISM}_i = \log\left( \frac{\mathrm{PR}_i}{\mathrm{PR}_K} \right), \qquad (7)$$

and $\Sigma$, a (K − 1) × (K − 1) variance-covariance matrix of $\delta$ with its i-th diagonal element (1 ≤ i < K) being

$$\mathrm{Var}(\log \mathrm{PRISM}_i) = \mathrm{Var}(\log \mathrm{PR}_i) + \mathrm{Var}(\log \mathrm{PR}_K), \qquad (8)$$

and its i-th row and j-th column (1 ≤ i ≠ j < K) off-diagonal element being

$$\mathrm{Cov}(\log \mathrm{PRISM}_i, \log \mathrm{PRISM}_j) = \mathrm{Var}(\log \mathrm{PR}_K). \qquad (9)$$

Next, we calculate the following heterogeneity statistic (Het):

$$\mathrm{Het} = \delta' \Sigma^{-1} \delta. \qquad (10)$$

Asymptotically (large-stratum limiting model), Het follows a chi-square distribution with K − 1 degrees of freedom (df) under the null hypothesis of peril-ratio homogeneity (no mechanistic interaction).
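The heterogeneity test in equations (7) to (10) can be sketched in pure Python as below. Everything named here is illustrative: the small Gaussian-elimination solver and the closed-form chi-square tail are stand-ins for a linear-algebra or statistics library, and the stratum variance is the assumed reverse-coded log risk-ratio variance.

```python
import math

def log_pr_and_var(free_unexp, n_unexp, free_exp, n_exp):
    s0, s1 = free_unexp / n_unexp, free_exp / n_exp
    return math.log(s0 / s1), (1 - s0) / (n_unexp * s0) + (1 - s1) / (n_exp * s1)

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df."""
    y = x / 2
    if df % 2 == 0:
        return math.exp(-y) * sum(y**k / math.factorial(k) for k in range(df // 2))
    tail = sum(y**(k - 0.5) / math.gamma(k + 0.5) for k in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(y)) + math.exp(-y) * tail

def het_test(strata):
    """Het = delta' Sigma^{-1} delta, using the last stratum as the reference."""
    stats = [log_pr_and_var(*row) for row in strata]
    log_k, var_k = stats[-1]
    delta = [lp - log_k for lp, _ in stats[:-1]]                  # eq. (7)
    sigma = [[var_k + (v if i == j else 0.0)                      # eqs. (8), (9)
              for j in range(len(delta))]
             for i, (_, v) in enumerate(stats[:-1])]
    het = sum(d * s for d, s in zip(delta, solve(sigma, delta)))  # eq. (10)
    df = len(strata) - 1
    return het, df, chi2_sf(het, df)

# Coronary heart disease example, five age strata:
# (CHD-free type B, total type B, CHD-free type A, total type A)
chd = [(271, 282, 241, 261), (574, 595, 462, 496), (343, 364, 337, 386),
       (184, 201, 209, 247), (114, 123, 162, 199)]
het, df, p_value = het_test(chd)
print(f"Het = {het:.4f}, df = {df}, p = {p_value:.4f}")
```

The choice of the K-th stratum as the reference is a convention; the quadratic form gives the same value whichever stratum is used as the reference.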
If a summary effect measure is the desired end but the assumption of peril-ratio homogeneity fails, then one can resort to standardization techniques [1] in order to pool the strata. In general, the resulting standardized effect measures have larger variances. Using the total population as the standard, the standardized peril ratio is calculated as

$$\mathrm{PR}_S = \frac{\mathrm{Peril}^{\mathrm{Set}\,E}}{\mathrm{Peril}^{\mathrm{Set}\,\bar{E}}} = \frac{1/S^{\mathrm{Set}\,E}}{1/S^{\mathrm{Set}\,\bar{E}}} = \frac{\sum_i w_i^{S} \times S_i^{\bar{E}}}{\sum_i w_i^{S} \times S_i^{E}}, \qquad (11)$$

where the 'Set X' operator dictates that the exposure status of each and every subject in the population is set to X, and the weight, $w_i^{S} = N_i^{\bar{E}} + N_i^{E}$, is the population size (the arithmetic sum of the unexposed and the exposed; cf. the harmonic sum in the Mantel-Haenszel weight) for the i-th stratum. The variance of $\log \mathrm{PR}_S$ under the large-stratum limiting model is

$$\mathrm{Var}(\log \mathrm{PR}_S) = \frac{\sum_i (w_i^{S})^2 \times \mathrm{Var}(S_i^{\bar{E}})}{\left( \sum_i w_i^{S} \times S_i^{\bar{E}} \right)^2} + \frac{\sum_i (w_i^{S})^2 \times \mathrm{Var}(S_i^{E})}{\left( \sum_i w_i^{S} \times S_i^{E} \right)^2} \qquad (12)$$



Table 3. Re-analysis of the coronary heart disease (CHD) occurrence data for personality type A and B persons (a).

Stratum         Personality   No CHD   CHD   Total   Peril Ratio (95% CI (b))
Age <40         Type B           271    11     282   1.0000
                Type A           241    20     261   1.0407 (0.9978-1.0855)
Age 40-44       Type B           574    21     595   1.0000
                Type A           462    34     496   1.0357 (1.0067-1.0655)
Age 45-49       Type B           343    21     364   1.0000
                Type A           337    49     386   1.0793 (1.0311-1.1298)
Age 50-54       Type B           184    17     201   1.0000
                Type A           209    38     247   1.0819 (1.0110-1.1577)
Age 55+         Type B           114     9     123   1.0000
                Type A           162    37     199   1.1385 (1.0479-1.2369)
Total (Crude)   Type B          1486    79    1565   1.0000
                Type A          1411   178    1589   1.0693 (1.0472-1.0919)

(a) From Table 7-24 in Selvin S (1991) Statistical Analysis of Epidemiologic Data. New York: Oxford University Press.
(b) CI: confidence interval.
doi:10.1371/journal.pone.0086374.t003



with $\mathrm{Var}(S_i^{\bar{E}}) = \dfrac{S_i^{\bar{E}} \times (1 - S_i^{\bar{E}})}{N_i^{\bar{E}}}$ and $\mathrm{Var}(S_i^{E}) = \dfrac{S_i^{E} \times (1 - S_i^{E})}{N_i^{E}}$. This standardized peril ratio is what the fold decrease in the disease-free proportion for the entire population would be if we, contrary to fact, changed the exposure prevalence of the population from 0% to 100%.
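A minimal sketch of the standardized peril ratio and its large-sample variance, following equations (11) and (12) with binomial variances for the disease-free proportions, might look like this. The function name and tuple layout are illustrative only.

```python
import math

def standardized_pr(strata, z=1.96):
    """Standardized peril ratio with the total population as the standard.

    Each stratum is (disease-free unexposed, total unexposed,
                     disease-free exposed,   total exposed).
    """
    num = den = var_num = var_den = 0.0
    for free_unexp, n_unexp, free_exp, n_exp in strata:
        w = n_unexp + n_exp                          # arithmetic-sum weight
        s0, s1 = free_unexp / n_unexp, free_exp / n_exp
        num += w * s0                                # everyone set to unexposed
        den += w * s1                                # everyone set to exposed
        var_num += w**2 * s0 * (1 - s0) / n_unexp    # w^2 * Var(S unexposed)
        var_den += w**2 * s1 * (1 - s1) / n_exp      # w^2 * Var(S exposed)
    pr = num / den
    var_log = var_num / num**2 + var_den / den**2    # delta method on the log scale
    half = z * math.sqrt(var_log)
    return pr, var_log, (pr * math.exp(-half), pr * math.exp(half))

# Tolbutamide example, two age strata
tolbutamide = [(115, 120, 98, 106), (69, 85, 76, 98)]
pr_s, var_s, ci_s = standardized_pr(tolbutamide)
print(f"PR_S = {pr_s:.4f}, Var(log PR_S) = {var_s:.6f}")
```

With a single stratum the expression reduces to the stratum-specific peril ratio and its log-scale variance, which is a convenient sanity check on the implementation.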

It is worth noting that all three summary peril ratios considered in this paper, $\mathrm{PR}_W$, $\mathrm{PR}_{MH}$ and $\mathrm{PR}_S$, enjoy the 'collapsibility' property [1]; that is, the summary effect measure will fall somewhere in the middle of the stratum-specific measures. This is so because they are weighted averages of the stratum-specific peril ratios [$\mathrm{PR}_W$: geometric average with the weights proportional to $1/\mathrm{Var}(\log \mathrm{PR}_i)$; $\mathrm{PR}_{MH}$: arithmetic average with the weights proportional to $\dfrac{1}{\frac{1}{N_i^{\bar{E}}} + \frac{1}{N_i^{E}}} \times S_i^{E}$; $\mathrm{PR}_S$: arithmetic average with the weights proportional to $(N_i^{\bar{E}} + N_i^{E}) \times S_i^{E}$].

Examples 

In this section, three real data sets are re-analyzed in order to demonstrate the methodologies.

1. Mortality Data from All Causes for Tolbutamide and Placebo Treatment Groups

The first example consists of randomized controlled trial data comparing all-cause mortality between the tolbutamide treatment and placebo groups, taken from Table 15-1 in the textbook Modern Epidemiology [1]. The stratifying variable is age (two strata: age <55 and age 55+). Table 2 presents the peril ratios and their 95% confidence intervals (CIs) for the two strata. The heterogeneity statistic is calculated as Het = 0.0140. This is to be referred to a chi-square distribution with df = K − 1 = 2 − 1 = 1, and the p-value is 0.9057.

Because the data is consistent with peril-ratio homogeneity (no mechanistic interaction between treatment and age), I pooled the two strata in order to obtain a common peril ratio: $\mathrm{PR}_W$ = 1.0383 (95% CI: 0.9776-1.1027) using Woolf's method, or $\mathrm{PR}_{MH}$ = 1.0407 (95% CI: 0.9688-1.1179) using the Mantel-Haenszel method. This implies an approximately 4% reduction in survival for anyone, young or old, who chooses to take tolbutamide (though it is not significant, judging from the 95% CIs that cover the no-effect peril ratio of one). For this example, age is not an important confounder; the common peril ratio and the crude peril ratio (1.0523) differ very little. This is no surprise as the data is drawn from a randomized controlled trial.

2. Coronary Heart Disease Occurrence Data for 
Personality Type A and B Persons 

The second example consists of cohort data comparing the occurrence of coronary heart disease (CHD) between personality type A and B persons, taken from Table 7-24 in the textbook Statistical Analysis of Epidemiologic Data [16]. The stratifying variable is age (a total of five strata). Table 3 presents the peril ratios and their 95% CIs for the five strata.

Again, the data is consistent with peril-ratio homogeneity (Het = 6.7357 and p-value = 0.1505, based on a chi-square distribution with df = K − 1 = 5 − 1 = 4). I then pooled the five
Table 4. Re-analysis of the subsequent cocaine use data for exposed twin members (early marijuana users) and their unexposed co-twins (non-marijuana users) (a).

                                               Early           Cocaine Use at a Later Age   Total
Pair Type                                      Marijuana Use   No        Yes                Subjects
Neither Has the Outcome (Pairs = 141)          No                1         0                   1
                                               Yes               1         0                   1
Exposed Member Has the Outcome (Pairs = 88)    No                1         0                   1
                                               Yes               0         1                   1
Unexposed Member Has the Outcome (Pairs = 21)  No                0         1                   1
                                               Yes               1         0                   1
Both Have the Outcome (Pairs = 61)             No                0         1                   1
                                               Yes               0         1                   1
Total (Pairs = 311)                            No              229        82                 311
                                               Yes             162       149                 311

(a) From Cummings P, McKnight B (2004) Analysis of matched cohort data. Stata J 4: 274-281.
doi:10.1371/journal.pone.0086374.t004



strata in order to obtain a common peril ratio: $\mathrm{PR}_W$ = 1.0538 (95% CI: 1.0335-1.0744) using Woolf's method, or $\mathrm{PR}_{MH}$ = 1.0622 (95% CI: 1.0407-1.0842) using the Mantel-Haenszel method. This implies a 5-6% reduction in CHD-free probability for a type A person when compared with a type B person of the same age. This reduction is significant, judging from the 95% CIs that do not cover the no-effect peril ratio of one.

3. Subsequent Cocaine Use Data for Early Marijuana 
Users and Non-Users 

The final example consists of twin follow-up data for subsequent cocaine use comparing exposed twin members (early marijuana users) with their unexposed co-twins (non-marijuana users), and is taken from the paper of Cummings and McKnight [17]. Treating each twin pair (a total of 311 pairs) as one separate stratum, the data can be presented in a total of 311 'tables' (see Table 4).

This matched-pair data is in accord with the sparse-data limiting model; there are only two subjects in each stratum but the total number of strata is large. Therefore, I applied the Mantel-Haenszel method in order to pool the strata: $\mathrm{PR}_{MH}$ = 1.4136 (95% CI: 1.2711-1.5720). (Both Woolf's method and the standardization method rely on the large-stratum limiting model and are not applicable to this example. The heterogeneity test also relies on the large-stratum limiting model. Therefore, in this example, the assumption of no mechanistic interaction has to be invoked if the strata are to be pooled, but the assumption itself is not amenable to testing.) This implies a significant, approximately 1.4-fold decrease in cocaine-naive probability in later years for an early marijuana user when compared with his/her non-marijuana-using co-twin. Ignoring the paired structure of the data, the crude peril ratio for this example is the same as $\mathrm{PR}_{MH}$ but the variance is larger [$\mathrm{Var}(\log \mathrm{PR}_{crude})$ = 0.0041 > $\mathrm{Var}(\log \mathrm{PR}_{MH})$ = 0.0029].
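The matched-pair calculation can be sketched by expanding the four pair types of Table 4 into 311 two-subject strata. As in the earlier sketches, the names are illustrative, and the Mantel-Haenszel variance is the Greenland-Robins risk-ratio variance applied with reverse coding, which is an assumption rather than the paper's printed formula.

```python
import math

def mantel_haenszel_pr(strata, z=1.96):
    """Each stratum: (free unexposed, n unexposed, free exposed, n exposed)."""
    num = den = var_top = 0.0
    for free_unexp, n_unexp, free_exp, n_exp in strata:
        n = n_unexp + n_exp
        w = n_unexp * n_exp / n                       # Mantel-Haenszel weight
        num += w * free_unexp / n_unexp
        den += w * free_exp / n_exp
        var_top += w * (free_unexp + free_exp) / n - free_unexp * free_exp / n
    pr = num / den
    var_log = var_top / (num * den)                   # assumed variance form
    half = z * math.sqrt(var_log)
    return pr, var_log, (pr * math.exp(-half), pr * math.exp(half))

# One stratum per twin pair; "free" means no later cocaine use.
pairs = ([(1, 1, 1, 1)] * 141    # neither twin has the outcome
         + [(1, 1, 0, 1)] * 88   # only the exposed twin has the outcome
         + [(0, 1, 1, 1)] * 21   # only the unexposed twin has the outcome
         + [(0, 1, 0, 1)] * 61)  # both twins have the outcome
pr_mh, var_mh, (lower, upper) = mantel_haenszel_pr(pairs)

# Crude analysis ignoring pairing: 229/311 unexposed and 162/311 exposed cocaine-free.
s0, s1 = 229 / 311, 162 / 311
pr_crude = s0 / s1
var_crude = (1 - s0) / (311 * s0) + (1 - s1) / (311 * s1)

print(f"PR_MH = {pr_mh:.4f} ({lower:.4f}, {upper:.4f}), Var = {var_mh:.4f}")
print(f"PR_crude = {pr_crude:.4f}, Var = {var_crude:.4f}")
```

Concordant pairs contribute nothing to the variance numerator in this form, so the precision of the matched analysis is driven by the 109 discordant pairs.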
exceedingly low, a log peril ratio can be approximated by the
common risk difference [1]. For a case-control study of a rare